IBIS Macromodel Task Group

Meeting date: 03 September 2019

Members (asterisk for those attending):
ANSYS:                        Dan Dvorscak
                            * Curtis Clark
Cadence Design Systems:     * Ambrish Varma
                              Ken Willis
                              Kumar Keshavan
Intel:                        Michael Mirmak
Keysight Technologies:      * Fangyi Rao
                            * Radek Biernacki
                              Ming Yan
                              Stephen Slater
                              Maziar Farahmand
Mentor, A Siemens Business: * Arpad Muranyi
Micron Technology:          * Randy Wolff
                            * Justin Butterfield
SiSoft (Mathworks):         * Walter Katz
                              Mike LaBonte
SPISim:                     * Wei-hsing Huang
Teraspeed Labs:             * Bob Ross

The meeting was led by Arpad Muranyi.  Curtis Clark took the minutes.

--------------------------------------------------------------------------------
Opens:

- Arpad noted that he had added two new items to the agenda.
  - 6) Enabling Backchannel Interface in Statistical Mode
  - 7) BIRD198.1 draft review.

-------------
Review of ARs:

- Ambrish to send out a new draft of BIRD197.5 incorporating changes made during the meeting.
  - Done.

--------------------------
Call for patent disclosure:

- None.

-------------------------
Review of Meeting Minutes:

Arpad asked for any comments or corrections to the minutes of the August 27 meeting. Ambrish noted one comment from Walter that hadn't been captured in the minutes. The group had decided to remove this sentence from the previous draft:

  If DC_Offset is Usage In, the EDA tool may use the input value of DC_Offset to post process the data returned by the AMI model.

During the discussion surrounding this sentence, Ambrish recalled that Walter had said, "we [EDA vendors] all agree... [on what to do with DC_Offset as an In]". Walter said he thought he'd said something like, "We [EDA vendors] all agree on what we should be doing with DC_Offset as an input and as an output." Walter noted that he thinks we all do agree.
Ambrish agreed, but he noted that he had only been willing to remove that sentence because he believed there was general consensus that, if DC_Offset were Usage In, the tool could add the input value to the Rx GetWave() output waveform if it chose to do so. Ambrish said that with this comment recorded here he was happy with the previous minutes. Ambrish moved to approve the minutes. Walter seconded the motion. There were no objections.

-------------
New Discussion:

BIRD198.1:
As he had proposed doing at the previous meeting, Arpad presented a summary of the new draft he, Bob, Randy, and Mike L. had received from the authors. He noted that it was a well written and thorough BIRD. It proposed a few keywords to describe an RC circuit between power and ground rails on the die. The new version incorporated some language to allow for coexistence with IBIS 7.0 (BIRD189) interconnect model syntax. It provided a truth table defining various combinations with BIRD189 that were illegal conflicts, combinations that were syntactically legal but could give questionable results if not done properly, and combinations that were straightforward and had no problems. Arpad noted that he understood that the goal of BIRD198.1 was simplicity, but he expressed concern about adding a new way to do something that could be handled with BIRD189 syntax.

Walter noted that he had also reviewed the new draft. He said he thought they wanted to do something very simple, but it had become overly complicated. He said he thought they really wanted to define an on-die circuit for every rail node. He thought it might be handled more easily by creating a new on-die decap keyword, instead of a new model type. Walter said he thought the simple circuit topology in BIRD198.1 might be useful to have for power modeling, and BIRD189 could be used for signal interconnect modeling. Walter thought the simple circuit concept was useful, but the proposed syntax was too confusing.
He said he didn't understand the [Model Selector]s in the proposal. Walter said he would draft a simpler proposal using the two resistors and a capacitor, but strictly between Pin signal_names.

Bob noted that the current proposal uses bus_labels too. Walter said he thought the authors included bus_labels because BIRD189 did, and he thought it was making the proposal more complicated than was necessary to accomplish their goal.

Arpad noted one other potential logistical issue. The proposal introduced a new Model_type, but it may not have addressed all the associated issues, for example, Sub-Params like C_comp that are currently required for all [Model]s. Bob noted that existing series model types were already different than other model types [the spec states that C_comp is ignored for series models]. Arpad said this was the type of explanation that might be needed for this proposal too. Walter said he would propose another option instead.

Walter took an AR to draft an alternate proposal that he thought would meet the authors' requirements with a simpler syntax. Bob said not to rule out bus_labels arbitrarily. He said that might be the best solution, and that's how it's tied in with the I/O buffers. Walter agreed and suggested we review the options once he produces his alternate proposal.

Enabling Back-channel in Statistical Mode:
Walter shared a presentation introducing the topic. Walter noted that rather than draft a BIRD and bring it for review, he wanted to introduce the topic and issues and then have brainstorming sessions to resolve the questions and build the solution together.

- Justification (slide 3)
  - Requested by IC vendors and users.
  - Personal experience developing optimization algorithms for DDR5.
  - Desire to move the optimization algorithms into Tx and/or Rx models in an IBIS compliant way.
  - Enable all EDA companies to support BCI optimization in Statistical and/or Time Domain.
- How to do it? (slides 4-6)
  - Will require a new function similar to AMI_Init().
    - Function name is yet to be decided.
    - Function won't allocate memory, it will use the handle allocated by Init().
  - Protocol will determine the information transferred between Tx and Rx .dlls (and any repeaters in the middle).
  - New Reserved parameter "BCI_Training_Type" to indicate if BCI Statistical mode is supported. (e.g., "GetWave", "Both", "Statistical(name TBD)")
  - BCI_parameters_in and BCI_parameters_out as new function arguments.
  - Like Init(), the new function takes an IR in and returns an IR.
- Communication between the .dlls. (slide 7)
  - Could use file I/O as in BIRD147. EDA tool would stay out of the BCI communication path itself and just call the new function(s) in the Tx and Rx iteratively.
  - Could use strings generated by the models and passed back and forth via the BCI_parameters_in and BCI_parameters_out arguments.
  - Other suggestions?
- Statistical Training Flow (slide 8)
  - EDA tool will alternately call this new function in the Tx and Rx if:
    - Training is On
    - BCI Statistical is allowed and enabled on the Tx and Rx.
  - Training stops when "converged" (or a failure occurs).
  - Results from last call to the function should be used for statistical analysis.

Ambrish asked how many back-and-forth iterations one might expect during the training phase in Statistical flow. Walter said he'd had a good bit of fun working on a DDR5 DQ write example using the same channel step response from his DesignCon presentations. So, with a real channel and hopefully real DDR5 buffer models, some of the optimization algorithms had taken 700 iterations to converge, and he'd gotten that down to 250, perhaps down to 70 with a good initial guess. So, he suggested on the order of several hundred to 1000 iterations, and said he could imagine convergence taking longer for more complicated channels.
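
The Statistical Training Flow on slide 8 might be sketched roughly as follows. This is an illustration only, not a proposed interface: actual AMI functions are C functions exported from the model .dlls, the new function's name and its communication mechanism had not been decided at this meeting, and the names `run_statistical_training`, `make_toy_tx`, `make_toy_rx`, the "request" and "status" keys, and the single-gain toy models are all hypothetical. The 1000-iteration cap simply reflects the upper end of the range Walter mentioned.

```python
from typing import Callable, Dict, List, Tuple

IR = List[float]            # impulse response samples
Params = Dict[str, str]     # stand-in for BCI_parameters_in / BCI_parameters_out
# Hypothetical shape of the new function: takes an IR in and returns an IR,
# plus the parameters to hand to the other side.
TrainFn = Callable[[IR, Params], Tuple[IR, Params]]


def run_statistical_training(tx_train: TrainFn, rx_train: TrainFn,
                             channel_ir: IR, max_iterations: int = 1000):
    """EDA-tool side: alternately call the Tx and Rx until converged or failed."""
    params: Params = {}
    rx_ir = list(channel_ir)
    for iteration in range(1, max_iterations + 1):
        tx_ir, params = tx_train(channel_ir, params)   # Tx applies its settings
        rx_ir, params = rx_train(tx_ir, params)        # Rx evaluates and responds
        status = params.get("status", "training")
        if status in ("converged", "failed"):
            return rx_ir, status, iteration
    return rx_ir, "failed", max_iterations


def make_toy_tx(step: float = 0.1) -> TrainFn:
    """Toy Tx with one gain setting that it steps up or down on request."""
    state = {"gain": 0.5}

    def tx_train(ir: IR, params_in: Params) -> Tuple[IR, Params]:
        request = params_in.get("request", "hold")
        if request == "up":
            state["gain"] += step
        elif request == "down":
            state["gain"] -= step
        return [state["gain"] * x for x in ir], {"gain": f"{state['gain']:.3f}"}

    return tx_train


def make_toy_rx(target_peak: float = 1.0, tolerance: float = 0.05) -> TrainFn:
    """Toy Rx that asks for more or less gain until the IR peak is on target."""

    def rx_train(ir: IR, params_in: Params) -> Tuple[IR, Params]:
        peak = max(ir)
        out = dict(params_in)
        if abs(peak - target_peak) <= tolerance:
            out["request"], out["status"] = "hold", "converged"
        else:
            out["request"] = "up" if peak < target_peak else "down"
            out["status"] = "training"
        return ir, out

    return rx_train


final_ir, status, iterations = run_statistical_training(
    make_toy_tx(), make_toy_rx(), [0.0, 1.0, 0.4, 0.1])
```

In this sketch the EDA tool never interprets the parameters beyond a termination flag; it only relays them, which matches the "conduit" role discussed for the tool. The IR returned by the last call is the one that would be used for statistical analysis.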
Ambrish noted that unlike the GetWave() bit-by-bit flow where behavior changes over multiple GetWave() blocks, here we only had a single IR in Statistical training. Couldn't the IR be analyzed in one or two iterations? Walter noted that if compute power were unlimited, in his DDR5 example one could take the 3 FFE taps, and the 4 DFE taps, each of which had 30 settings, and ignoring gain or other parameters still have (30)^7 combinations. But in practice you can't just use brute force and run every combination. So we need a way to start and head toward the optimum solution. Gradient search and many other algorithms could be used to avoid trying all the combinations. Try some settings, convert the modified IR to an eye or COM or some other metric and optimize based on that.

Ambrish asked if all this would happen in the Rx, and noted that we can't legislate what the Rx needs to do to determine its optimal set-up. Walter noted that the DDR5 Rx model might be stupid. No DFE optimization, or CDR, it just expects to be told the tap weights and have the skew set. In DDR5 DQ Write protocol, it's the DDR5 controller (Tx) that is controlling the Rx. That's the way the real hardware works, and the memory model maker is going to want the controller vendor to write the optimization algorithms into the Tx model.

Wei-hsing asked if this optimization could be done in the EDA tool rather than in the model. Walter said that the optimization algorithms that are developed will go into the Tx. We are enabling this flow to happen if we have the EDA tool alternately calling the new function in the Tx and Rx and letting them negotiate. Wei-hsing said the EDA tool wouldn't need to do the full sweep of all the combinations, and the EDA tool could control the optimization flow and call the new function with different settings and follow its own optimization path.
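
To put the numbers above in perspective, the following sketch computes the brute-force count for 7 taps with 30 settings each and compares it with the number of evaluations a simple guided search needs on a toy problem. It is a sketch under stated assumptions: the coordinate search shown is a stand-in chosen for brevity, not the algorithm Walter used, and the metric (a made-up score with a hidden optimum) merely takes the place of an eye or COM figure derived from the modified IR. All function and variable names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

N_TAPS = 3 + 4       # 3 FFE taps + 4 DFE taps, as in the DDR5 example
N_SETTINGS = 30      # settings per tap, as in the DDR5 example
brute_force_count = N_SETTINGS ** N_TAPS   # 30^7 = 21,870,000,000


def coordinate_search(metric: Callable[[Sequence[int]], float],
                      start: Sequence[int],
                      n_settings: int) -> Tuple[List[int], float, int]:
    """Step one tap at a time by +/-1 and keep any change that improves the metric."""
    current = list(start)
    best = metric(current)
    evaluations = 1
    improved = True
    while improved:                     # stop after a full pass with no gain
        improved = False
        for tap in range(len(current)):
            for step in (-1, 1):
                candidate = list(current)
                candidate[tap] += step
                if not 0 <= candidate[tap] < n_settings:
                    continue            # setting index out of range
                evaluations += 1
                score = metric(candidate)
                if score > best:
                    current, best, improved = candidate, score, True
    return current, best, evaluations


# Toy metric (assumption): higher is better, best at HIDDEN_OPTIMUM.
HIDDEN_OPTIMUM = [12, 3, 27, 8, 15, 0, 22]


def toy_metric(settings: Sequence[int]) -> float:
    return -float(sum((s - o) ** 2 for s, o in zip(settings, HIDDEN_OPTIMUM)))


best_settings, best_score, evaluations = coordinate_search(
    toy_metric, [15] * N_TAPS, N_SETTINGS)
```

On this well-behaved toy metric the search reaches the optimum in a few hundred evaluations, the same order as the iteration counts reported in the meeting. A real eye or COM metric need not be this smooth, so a local search of this kind could stall short of the best settings.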
Ambrish said the lesson from BIRD147 was that it would be hard to define that flow for the EDA tool in the spec, and it's easier to have the EDA tool be the conduit that allows the AMI Tx and Rx to communicate. Walter noted that EDA tools providing the optimization flow had been the only option. But what customers wanted was the optimization to be done by the appropriate model .dll and in a flow compliant with the IBIS standard.

- How to proceed (slide 9)
  - Brainstorming session to:
    - Determine the name of the new function
    - Determine the communication mechanism
    - Determine new Reserved Parameters.
  - Develop a DDR5 DQ Write BCI Protocol
    - Invite additional memory and controller vendors.
  - Develop a Generic Tx N-tap FFE BCI Protocol.

Walter noted that in real hardware the controller has a training process to determine the right value of VREFDQ to set in the buffer. Ideally it would be the same as the DC_Offset, but in fact the DC_Offset has nothing to do with the register that sets VREFDQ. DC_Offset has to do with the step response at the input to the buffer; it's related to the single-ended waveform coming in to the receiver. But the Rx may not be set (register setting) to that voltage, and model makers might like to return the actual value of that reference voltage. Walter suggested that this might be a well-defined value to return as the output value of DC_Offset, so we might consider this in the DC_Offset BIRD. This was why he'd asked to table the DC_Offset BIRD discussion until we discussed this topic.

Fangyi noted that individual Tx models would be training individual Rx models in this proposal. He said in the real world, multiple Txs and multiple Rxs might share the same settings, for example per nibble or per byte settings of the Txs and Rxs. He said this might be a limitation of this proposal if we can't force different Txs and Rxs to share the same settings.
Walter said it was a possibility that you might want to train on one Tx/Rx pair and apply it to others. He said that thus far customers were asking for training of individual channels, but Fangyi had posed an interesting question. Fangyi said real systems were training per nibble or per byte (4DQ or 8DQ sharing the same settings). Walter asked if this was a requirement or just something that had been done. Justin said DRAM had the capability to set DFE taps per DQ, so that possibility should be covered. Walter said it might be likely that all the routing for a nibble would be identical, and you'd end up with the same outcome if you optimized them individually or as a group. Fangyi agreed this was possible.

Fangyi said VREFDQ might be somewhat different, and that it was also shared by a nibble or byte lane. A particular register might be set that applied to an entire nibble. Walter agreed this was a possibility. Randy said he wasn't sure how much we could say since the spec hasn't been published, but perhaps it's a possibility there's per DQ VREFDQ adjustment as well.

Walter said as a practical matter all the routing for a nibble will be almost identical, the models will be the same, the DC_Offset will be the same. Hardware may train them individually, or all 4 at once. If they're all the same, you only have to train on one. If there are differences in length, etc., then it becomes an interesting problem if you only have one set of settings for all 4. Fangyi noted that in practice the physical or electrical lengths can differ even within a byte lane. The timing skew can be calibrated per bit, which is an indication of different electrical lengths for bits within a byte lane. Walter agreed.

Fangyi asked if this training is limited to using strictly statistical methods. Walter said it takes an IR in, and it returns an IR, but nothing would prevent the model from generating a time domain waveform to use during training. Fangyi agreed.
Walter said the time domain waveform could be used to handle non-linearities, and even to prepare for future GetWave() simulation, but only the linearized version could be returned in the final IR.

- Walter: Motion to adjourn.
- Curtis: Second.
- Arpad: Thank you all for joining.

AR: Walter to draft his alternate proposal for BIRD198.1.

-------------
Next meeting: 10 September 2019 12:00pm PT

-------------
IBIS Interconnect SPICE Wish List:

1) Simulator directives